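Measuring the purity of metal powder is critical for preserving the quality of additively manufactured products. Contamination is one of the most troublesome problems: it can arise from multiple causes and lead to cracking and malfunction of the as-built components. Existing methods for metallurgical condition assessment are mostly time-consuming and focus mainly on the physical integrity of the structure rather than on material composition. By capturing spectral data over a wide frequency range together with spatial information, hyperspectral imaging (HSI) can detect minor differences in temperature, moisture, and chemical composition. HSI can therefore provide a unique way to tackle this challenge. In this paper, applications of HSI to the non-destructive inspection of metal powders are introduced, using a near-infrared HSI camera. Technical assumptions and solutions for three step-by-step case studies are presented in detail, covering powder characterization, contamination detection, and band selection analysis. Experimental results fully demonstrate the potential of HSI and related AI techniques for non-destructive testing (NDT) of powder metallurgy, especially their potential to meet the needs of industrial manufacturing environments.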
translated by 谷歌翻译
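As a major branch of non-photorealistic rendering (NPR), image stylization mainly uses computer algorithms to render a photograph as an artistic painting. Recent work has shown that extracting style information, such as the stroke texture and color of the target style image, is the key to image stylization. Based on these stroke texture and color characteristics, a new stroke rendering method is proposed that fully considers the tonal characteristics and representative colors of the original oil painting, so that the tone of the original painting is carried into the stylized image and the result comes close to the artist's creative effect. Experiments validate the efficacy of the proposed model. The method is better suited to the works of pointillist painters, whose strokes have a relatively uniform sense of direction, especially for natural scenes. When the brush strokes of the original painting have a clearer sense of direction, simulating brushwork texture with this method may be less satisfactory.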
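Domain-adaptive text classification is a challenging problem for large-scale pretrained language models, because they often require expensive additional labeled data to adapt to new domains. Existing work usually fails to exploit the implicit relationships among words across domains. In this paper, we propose a novel method, Domain Adaptation with Structured Knowledge (DASK), which enhances domain adaptation by exploiting word-level semantic relationships across domains. DASK first builds a knowledge graph that captures the relationships between pivot terms (domain-independent words) and non-pivot terms in the target domain. During training, DASK then injects pivot-related knowledge graph information into source-domain texts. For the downstream task, these knowledge-injected texts are fed into a BERT variant capable of processing knowledge-injected textual data. Thanks to the knowledge injection, our model learns domain-invariant features for non-pivots according to their relationships with pivots. DASK ensures that the pivots behave in a domain-invariant way by dynamically inferring them from the polarity scores of candidate pivots during training with pseudo-labels. We validate DASK on a wide range of cross-domain sentiment classification tasks and observe an absolute performance improvement of 2.9% over baselines across 20 different domain pairs. Code will be made available at https://github.com/hikaru-nara/dask.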
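By capturing spectral data over a wide frequency range together with spatial information, hyperspectral imaging (HSI) can detect minor differences in temperature, moisture, and chemical composition. HSI has therefore been applied successfully in a variety of areas, including remote sensing for security and defense, precision agriculture for vegetation and crop monitoring, and quality control for food, drink, and pharmaceuticals. For condition monitoring and damage detection in carbon fibre reinforced polymer (CFRP), however, the use of HSI remains relatively unexplored, because existing non-destructive testing (NDT) techniques focus mainly on the physical integrity of structures rather than on material composition. HSI can provide a unique way to tackle this challenge. In this paper, applications of HSI to the non-destructive inspection of CFRP products are introduced, using a near-infrared HSI camera and taking the EU H2020 FibreeUSE project as the background. Technical challenges and solutions for three case studies are presented in detail, covering adhesive residue detection, surface damage detection, and cobot-based automated inspection. Experimental results fully demonstrate the great potential of HSI and related vision techniques for NDT of CFRP, especially their potential to meet the needs of industrial manufacturing environments.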
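To make the idea of combining spectral and spatial information concrete, the following is a minimal sketch of one common way to compare per-pixel spectra: the spectral angle between each pixel and a reference spectrum. This technique is an illustrative assumption and is not stated in the abstract; the cube layout (rows, columns, bands), the function names, the toy values, and the threshold are all hypothetical.

```python
import math

def spectral_angle(spectrum, reference):
    """Angle in radians between two spectra treated as vectors.

    Assumes neither spectrum is all zeros (illustrative sketch only).
    """
    dot = sum(s * r for s, r in zip(spectrum, reference))
    norm_s = math.sqrt(sum(s * s for s in spectrum))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_s * norm_r)))
    return math.acos(cos_angle)

def flag_pixels(cube, reference, threshold):
    """Return a 2D mask that is True where a pixel's spectrum deviates from the reference."""
    return [
        [spectral_angle(pixel, reference) > threshold for pixel in row]
        for row in cube
    ]

# Hypothetical 2 x 2 image with 3 spectral bands per pixel.
cube = [
    [[0.2, 0.4, 0.6], [0.1, 0.2, 0.3]],
    [[0.6, 0.4, 0.2], [0.2, 0.4, 0.61]],
]
reference = [0.2, 0.4, 0.6]  # hypothetical spectrum of the expected material
mask = flag_pixels(cube, reference, threshold=0.1)
```

Because the angle ignores overall brightness, the dimmer pixel with the same spectral shape is not flagged, while the pixel with a reversed spectral shape is.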
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built from the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
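The selection of features by information gain can be illustrated with a minimal sketch. It assumes discrete feature values and uses the standard definition, label entropy minus conditional entropy given the feature. The feature names, labels, and toy data are hypothetical and are not taken from MGTAB, whose actual selection procedure may differ.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def top_k_features(features, labels, k):
    """Names of the k features with the highest information gain."""
    scores = {name: information_gain(values, labels) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: four accounts, three binary property features.
labels = ["bot", "bot", "human", "human"]
features = {
    "verified": [0, 0, 1, 1],
    "has_url": [0, 1, 0, 1],
    "default_profile": [1, 1, 1, 0],
}
selected = top_k_features(features, labels, k=2)
```

In this toy data, `verified` separates the two classes perfectly and `has_url` carries no information about the label, so the former is ranked first and the latter is dropped.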
Learning feature interactions is the key to success in large-scale CTR prediction and recommendation. In practice, handcrafted feature engineering usually requires exhaustive searching. To reduce the high cost of human effort in feature engineering, researchers have proposed several deep neural network (DNN)-based approaches that learn feature interactions in an end-to-end fashion. However, existing methods either do not learn vector-wise and bit-wise interactions simultaneously, or fail to combine them in a controllable manner. In this paper, we propose a new model, xDeepInt, based on a novel network architecture called the polynomial interaction network (PIN), which learns higher-order vector-wise interactions recursively. By integrating a subspace-crossing mechanism, we enable xDeepInt to balance the mixture of vector-wise and bit-wise feature interactions at a bounded order. Based on the network architecture, we customize a combined optimization strategy to conduct feature selection and interaction selection. We implement the proposed model and evaluate its performance on three real-world datasets. Our experimental results demonstrate the effectiveness of xDeepInt compared with state-of-the-art models. We open-source the TensorFlow implementation of xDeepInt: https://github.com/yanyachen/xDeepInt.
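As a rough illustration of how higher-order vector-wise interactions can be built recursively, the sketch below applies a layer of the form X_l = X_{l-1} ∘ (W_{l-1} X_0) + X_{l-1}, where X_0 holds one embedding row per feature field and ∘ is the element-wise product. This update rule is an assumption made for illustration, since the abstract does not state the layer equation. The function names and toy weights are hypothetical, and this is plain Python rather than the authors' TensorFlow code.

```python
def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x d), both given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def interaction_layer(x_prev, x0, w):
    """One recursive step: x_prev * (w @ x0) + x_prev, element-wise."""
    mixed = matmul(w, x0)  # each row is a weighted sum of the field embeddings
    return [
        [xp * m + xp for xp, m in zip(row_prev, row_mixed)]
        for row_prev, row_mixed in zip(x_prev, mixed)
    ]

def stack_layers(x0, weights):
    """Apply the layers in order; each layer raises the interaction order by one."""
    x = x0
    for w in weights:
        x = interaction_layer(x, x0, w)
    return x

# Hypothetical input: 2 feature fields, embedding dimension 2.
x0 = [[1.0, 2.0], [3.0, 4.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]  # toy weight matrix that pairs each field with the other
one_layer = stack_layers(x0, [swap])
two_layers = stack_layers(x0, [swap, swap])
```

The field-mixing matrix `w` only combines whole embedding rows, so the interactions here stay vector-wise, and the added `x_prev` term keeps the lower-order terms in the output.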
In this paper, we study the problem of knowledge-intensive text-to-SQL, in which domain knowledge is necessary to parse expert questions into SQL queries over domain-specific tables. We formalize this scenario by building a new Chinese benchmark KnowSQL consisting of domain-specific questions covering various domains. We then address this problem by presenting formulaic knowledge, rather than by annotating additional data examples. More concretely, we construct a formulaic knowledge bank as a domain knowledge base and propose a framework (ReGrouP) to leverage this formulaic knowledge during parsing. Experiments using ReGrouP demonstrate a significant 28.2% improvement overall on KnowSQL.
Witnessing the impressive achievements of pre-training techniques on large-scale data in computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit to mitigate the sample inefficiency problem in visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains a large amount of information that is irrelevant to decision making, which makes the predominant pre-training approaches from general vision less suitable for autonomous driving. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework designed for policy pre-training in visuomotor driving. We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. PPGeo is trained in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns a driving policy representation by predicting the future ego-motion and optimizing the photometric error based on the current visual observation only. As a result, the pre-trained visual encoder is equipped with rich driving-policy-related representations and is thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios demonstrate the superiority of our approach, with improvements ranging from 2% to over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
The interview is regarded as one of the most crucial steps in recruitment. To prepare fully for interviews with recruiters, job seekers usually practice through mock interviews with one another. However, such a mock interview with peers is generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are unlikely to behave like a real interviewer. With the rapid growth of online recruitment in recent years, recruiters tend to hold interviews online, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) interview data are now available but still low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector from the dialog generator, so that most parameters can be trained with ungrounded dialogs and with resume data, neither of which is low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results in generating mock interviews. With the help of EZInterviewer, we hope to make mock interview practice easier for job seekers.